[Redshift] ウィンドウ関数:FIRST_VALUE、LAST_VALUEで文字列の最小値・最大値を取得
はじめに
最小値と最大値を取得する関数にはMINとMAXが存在しますが、文字列で実現するにはFIRST_VALUEとLAST_VALUEを使用します。
環境
MacOSX 10.10.5 Yosemite Redshift 1.0.1125
テーブルの準備
ミュージシャンのツアースケジュール:tour_schedule
CREATE SCHEMA IF NOT EXISTS blog; CREATE TABLE IF NOT EXISTS blog.tour_schedule ( id INTEGER , artist VARCHAR(32) , day DATE , Country VARCHAR(16) , City VARCHAR(16) , Venue VARCHAR(24) , PRIMARY KEY(id) ); INSERT INTO blog.tour_schedule VALUES (1,'Dimitri From Paris','2016-12-02','Kanada','Tronto','NEST') , (2,'Dimitri From Paris','2016-12-03','USA','New York','House Of Yes') , (3,'Timmy Regisford','2016-12-04','USA','New York','Output') , (4,'Dimitri From Paris','2016-12-17','Japan','Tokyo','Sound Museum Vision') , (5,'Timmy Regisford','2016-12-22','Japan','Kansai','Club Move') , (6,'Timmy Regisford','2016-12-24','Japan','Hokkaido','Precious Hall') , (7,'Timmy Regisford','2016-12-30','Japan','Chubu','Club JB''s') , (8,'Dimitri From Paris','2017-01-13','UK','London','The Nest');
testdb=# SELECT * FROM blog.tour_schedule ORDER BY id; id | artist | day | country | city | venue ----+--------------------+------------+---------+----------+--------------------- 1 | Dimitri From Paris | 2016-12-02 | Kanada | Tronto | NEST 2 | Dimitri From Paris | 2016-12-03 | USA | New York | House Of Yes 3 | Timmy Regisford | 2016-12-04 | USA | New York | Output 4 | Dimitri From Paris | 2016-12-17 | Japan | Tokyo | Sound Museum Vision 5 | Timmy Regisford | 2016-12-22 | Japan | Kansai | Club Move 6 | Timmy Regisford | 2016-12-24 | Japan | Hokkaido | Precious Hall 7 | Timmy Regisford | 2016-12-30 | Japan | Chubu | Club JB's 8 | Dimitri From Paris | 2017-01-13 | UK | London | The Nest (8 rows)
構文
Amazon Redshift | FIRST_VALUE および LAST_VALUE ウィンドウ関数 ウィンドウ関数の構文の概要
FIRST_VALUE | LAST_VALUE ( expression [ IGNORE NULLS | RESPECT NULLS ] ) OVER([ PARTITION BY expr_list ] [ ORDER BY order_list frame_clause ])
frame_clauseの記述は任意です。
MIN/MAXとの比較
目的
アーティスト名ごとの日程順に並び替えして、最初と最後の場所を表示する。
MIN/MAXの場合
SELECT artist , "day" , venue , MIN(venue) OVER(PARTITION BY artist ORDER BY "day" ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) , MAX(venue) OVER(PARTITION BY artist ORDER BY "day" ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM blog.tour_schedule ORDER BY artist, "day";
. artist | day | venue | min | max --------------------+------------+---------------------+--------------+--------------- Dimitri From Paris | 2016-12-02 | NEST | House Of Yes | The Nest Dimitri From Paris | 2016-12-03 | House Of Yes | House Of Yes | The Nest Dimitri From Paris | 2016-12-17 | Sound Museum Vision | House Of Yes | The Nest Dimitri From Paris | 2017-01-13 | The Nest | House Of Yes | The Nest Timmy Regisford | 2016-12-04 | Output | Club JB's | Precious Hall Timmy Regisford | 2016-12-22 | Club Move | Club JB's | Precious Hall Timmy Regisford | 2016-12-24 | Precious Hall | Club JB's | Precious Hall Timmy Regisford | 2016-12-30 | Club JB's | Club JB's | Precious Hall (8 rows)
何を基準としているのか分かりませんが、最大値と最小値が一致していません。
FIRST_VALUE/LAST_VALUEの場合
SELECT artist , "day" , venue , FIRST_VALUE(venue) OVER(PARTITION BY artist ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) , LAST_VALUE(venue) OVER(PARTITION BY artist ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM blog.tour_schedule ORDER BY artist, "day";
. artist | day | venue | first_value | last_value --------------------+------------+---------------------+-------------+------------ Dimitri From Paris | 2016-12-02 | NEST | NEST | The Nest Dimitri From Paris | 2016-12-03 | House Of Yes | NEST | The Nest Dimitri From Paris | 2016-12-17 | Sound Museum Vision | NEST | The Nest Dimitri From Paris | 2017-01-13 | The Nest | NEST | The Nest Timmy Regisford | 2016-12-04 | Output | Output | Club JB's Timmy Regisford | 2016-12-22 | Club Move | Output | Club JB's Timmy Regisford | 2016-12-24 | Precious Hall | Output | Club JB's Timmy Regisford | 2016-12-30 | Club JB's | Output | Club JB's (8 rows)
SQLはMIN/MAXとほぼ同じですが、結果が正しく表示されました。 PARTITION BY でアーティスト名で仕切り、 ORER BY で並び替えの対象カラムに日付を指定、 ROWS でレコード全体を範囲として指定しています。
応用してみる
目的
2016年12月15日〜2016日12月24までの9日間の日程と次の予定日と場所を表示する。
クエリ
SELECT * FROM ( SELECT artist , "day" , venue , LAST_VALUE("day") OVER(PARTITION BY artist ORDER BY "day" ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS next_day , LAST_VALUE(venue) OVER(PARTITION BY artist ORDER BY "day" ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS next_venue FROM blog.tour_schedule ORDER BY artist, "day" ) WHERE "day" BETWEEN '2016-12-15' AND '2016-12-24';
. artist | day | venue | next_day | next_venue --------------------+------------+---------------------+------------+--------------- Dimitri From Paris | 2016-12-17 | Sound Museum Vision | 2017-01-13 | The Nest Timmy Regisford | 2016-12-22 | Club Move | 2016-12-24 | Precious Hall Timmy Regisford | 2016-12-24 | Precious Hall | 2016-12-30 | Club JB's (3 rows)
OVER 句の frame_clause で現在の行〜次の行を指定しています。 また、4行目からのSELECTのWHEREで "day" BETWEEN '2016-12-15' AND '2016-12-24' と指定するとその範囲でしかLAST_VALUEの値が取得できないので、一度全体のレコードから作成してから、WHEREで範囲を指定しています。
さいごに
基本的な使い方だけではなく、自分なりに応用的なクエリを作成してみると挙動が良く分かると思います。